The SoVideo Mandarin Chinese Broadcast News Retrieval System
نویسندگان
چکیده
This paper describes the SoVideo broadcast news retrieval system for Mandarin Chinese. The system is based on technologies such as large-vocabulary continuous speech recognition for Mandarin Chinese, automatic story segmentation, and information retrieval. Currently, the database consists of 177 hours of broadcast news, which yields 3264 stories by automatic story segmentation. We discuss the development of the retrieval system, and the evaluation of each component and the retrieval system.
منابع مشابه
Speech retrieval of Mandarin broadcast news via mobile devices
This paper presents a system for speech retrieval of Mandarin broadcast news. First, several data-driven and unsupervised approaches are integrated into the broadcast news transcription system to improve the speech recognition accuracy and efficiency. Then, a multi-scale indexing paradigm for broadcast news retrieval is proposed to make use of the special structural properties of the Chinese la...
متن کاملMandarin Chinese Broadcast News Retrieval and Summarization Using Probabilistic Generative Models
This paper presents our recent research work on applying probabilistic generative models to Mandarin Chinese broadcast news retrieval and summarization. Most models can be trained in either a supervised or unsupervised manner. In addition, both literal term matching and concept matching strategies have been intensively investigated. This paper also presents a prototype web-based Mandarin Chines...
متن کاملSpeech Retrieval of Mandarin Broadcas
This paper presents a system for speech retrieval of Mandarin broadcast news. First, several data-driven and unsupervised approaches are integrated into the broadcast news transcription system to improve the speech recognition accuracy and efficiency. Then, a multi-scale indexing paradigm for broadcast news retrieval is proposed to make use of the special structural properties of the Chinese la...
متن کاملVoice retrieval of Mandarin broadcast news speech
This paper presents an improved framework for voice retrieval of Mandarin broadcast news speech. First, several unsupervised and data-driven approaches for broadcast news transcription were proposed to improve the speech recognition accuracy and efficiency. Then, a multiscale indexing paradigm for broadcast news retrieval was exploited to alleviate the problems caused by the speech recognition ...
متن کاملExperiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese
Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multi-media collections in the near future. Considering the characteristics and monosyllabic structure of the Chinese language, the syllable-based indexing for retrieval of spoken documents in Mandarin Chinese has been investigated, and extensive experiments on retrieval...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- I. J. Speech Technology
دوره 7 شماره
صفحات -
تاریخ انتشار 2004